ABSTRACT

We describe techniques for comparing spectra extracted from cosmological 
simulations and observational data, using the same methodology to link Lyman-α
properties derived from the simulations with properties derived from observational
data. The eventual goal is to measure the coherence or clustering properties
of Lyman-α absorbers using observations of quasar pairs and groups. We quantify
the systematic underestimate in opacity that is inherent in the continuum fitting 
process of observed spectra over a range of resolution and SNR. We present an 
automated process for detecting and selecting absorption features over the range 
of resolution and SNR of typical observational data on the Lyman-α "forest".
Using these techniques, we detect coherence over transverse scales out to 500 $h_{50}^{-1}$
kpc in spectra extracted from a cosmological simulation at $z = 2$.

Subject headings: intergalactic medium — large-scale structure of the universe 
— methods: data analysis, N-body simulations — quasars: absorption lines 

1. INTRODUCTION 

The numerous lines of Lyman-α absorption that appear in the spectra of quasars are
proving to be excellent cosmological probes. Hydrodynamic simulations of the universe reveal
an evolving network of sheets, filaments and halos caused by complex gravitational dynamics
in the expanding universe. The Lyman-α "forest" only identifies the neutral component of
the intergalactic medium, but the absorbers are well understood tracers of the overall mass
distribution (Hernquist et al. 1996; Miralda-Escudé et al. 1996; Rauch 1998). The simulations
show that the physical state of the diffuse gas causing Lyman-α absorption is relatively
simple, enabling the use of analytic models to relate the absorption to the underlying mass
and velocity fields (McGill 1990; Bi 1993; Gnedin & Hui 1996). The combination of theory
and simulation has allowed the Lyman-α forest to be used for measurements of the baryon
fraction (Rauch et al. 1997; Weinberg et al. 1997), the mass density (Weinberg et al. 1999), 
the amplitude of mass fluctuations (Gnedin 1998; Nusser & Haehnelt 2000), the mass power 
spectrum (Croft et al. 1998, 1999, 2001; McDonald et al. 2000), the thermal history of the 
IGM (Haehnelt & Steinmetz 1998; McDonald et al. 2000; Ricotti, Gnedin, & Shull 2000; 
Schaye et al. 2000), the chemical evolution of the Universe (Cen & Ostriker 1999; Aguirre et
al. 2001,?), and the metallicity of the IGM (Rauch et al. 1997; Hellsten et al. 1997; Davé et
al. 1998). 

On the observational side, the detection of coincident absorption lines in the spectra 
of quasar pairs provided the first evidence that Lyman-α absorbers have a large transverse
extent (Dinshaw et al. 1994; Bechtold et al. 1994; Dinshaw et al. 1998; Crotts & Fang 1998).
In principle, this idea can be extended with the use of quasar groups to thread a contiguous
volume and provide 3D "tomography" of the absorbers, revealing the variation of coherence
and homogeneity with redshift. In practice, however, quasars of the appropriate brightness,
redshift, and angular separation are hard to find. Moreover, at $z < 2$, the opacity drops
steadily and the Lyman-α forest thins out into a savannah, in agreement with numerical
simulations (Davé et al. 1999). Over most of the Hubble time, the absorbers can only be
observed in the vacuum ultraviolet with the twenty times smaller collecting area of the
Hubble Space Telescope (HST). Studies of the low redshift Lyman-α forest are still very
much limited by the quality of the available data. 

Figure 1 summarizes the state of observational capabilities for studies of the Lyman-α
forest. The boxes give the approximate bounds in resolution and SNR of several different
instrument and telescope combinations. Open boxes show examples of ground-based facilities
that can only measure Lyman-α at $z > 1.6$; shaded boxes show past, present and future
HST instruments. The horizontal bar shows the range of observed Doppler parameters of the
absorbing gas, which is independent of redshift (Penton, Shull, & Stocke 2000). Only echelle
observations are sensitive to the thermal and hydrodynamic properties of the absorbing gas. 
The best quality data come from Keck/HIRES and VLT/UVES. At lower redshift, the STIS 
echelle mode will be supplanted by the Cosmic Origins Spectrograph (COS). There is a large 
jump in resolution down to the level (1-2 Å) where a larger number of targets is available
and a larger redshift path length can be surveyed. The Keck/LRIS region is typical of most 
moderate resolution ground-based data. The volume of existing data will be eclipsed by 
the ~ 10^ high redshift quasar spectra that will emerge from the Sloan Digital Sky Survey 
(York et al. 2000). At lower redshift, the largest single data set comes from the HST Quasar 
Absorption Line Key Project (Jannuzi et al. 1998; Weymann et al. 1998). 

The goals of this paper are to (a) define a set of automated procedures that can be used 
to analyze data (with a wide range of spectral resolution, SNR, and physical separations 
between paired lines of sight) and spectra extracted from simulations, (b) form a bridge 
between the historically different methodologies of observers and simulators, and (c) facilitate 
the comparison between observations and simulations of the Lyman-α forest, in order to
derive cosmological constraints. 

The major challenge to the first goal is the fact that absorption features in a quasar 
spectrum must be measured with respect to an unknown continuum, and in the presence 
of broad emission features. This is usually accomplished with a low-order function fitted to 
the continuum, but no procedure is fully robust in dealing with broad absorption features or 
the region just blue-ward of the Lyman-α emission line. Even the comprehensive software of
the HST Key Project cannot automatically deblend complex spectral regions (Schneider et 
al. 1993). By contrast, spectra extracted from the hydrodynamic simulations yield opacities 
directly with respect to a predetermined continuum. In addition, observers have traditionally 
treated absorption in terms of discrete features modeled by Voigt profiles, which in turn must
be convolved with the instrumental response function. This approach is observationally 
successful, particularly at low redshift where spectral features have low number density 
and appear isolated in the spectra. Regardless of whether or not a Voigt profile provides a 
suitable description of the absorbing gas, there is evidence from extracted simulation spectra 
that much of the opacity is associated with shallow and smoothly varying absorption. This 
is commonly referred to as the fluctuating Gunn-Peterson approximation (Hernquist et al. 
1996; Weinberg, Katz, & Hernquist 1998). 

With regard to the second goal, there are substantial differences between the extraction 
of information from optical or UV spectra, and the extraction of information from a
spectrum created as a product of a cosmological simulation. Observational spectra have
finite SNR and resolution, each one spans a large 1-dimensional redshift range, and the
Lyman-α lines must be identified in spectral regions contaminated with metal lines from
higher column density absorbers. Details of the method for culling out metal lines from the
Lyman-α forest will be given in Paper II. Spectra extracted from hydrodynamic simulations
have no noise and extremely high resolution, although the spectral features result from gas
kinematics that map onto velocity space in a complex way. By contrast to the ~ Gpc or
larger 1D path of the Lyman-α forest in a quasar spectrum, the simulation boxes span 20-40
$h_{50}^{-1}$ Mpc and are aliased for large scale structure measurements on scales larger than 5-10
$h_{50}^{-1}$ Mpc. However, the simulations are ideal for 3D measures of the gas distribution since
~ 10^ independent spectra can be extracted, whereas 3D observations of absorber structure 
are limited by the rarity of suitable quasar pairs. 

The third goal of this project will be realized with the second and subsequent papers
in this series. In the second paper, we will compare coherence measures of absorbers at
$z \approx 2$ with extractions from simulations. In future papers, we will extend the work to lower
redshifts and investigate the dependence of the predicted absorber properties on cosmological
parameters. This paper sets the stage by describing the techniques for automated and
consistent comparisons between observations and simulations of the Lyman-α forest. In
section 2, we discuss the detection and selection of absorption features. In section 3, we 
discuss the measurement of physical properties of the absorbers. In section 4, we discuss 
results from single and multiple sight-lines, including a comparison between line counting 
and measures that use the entire information contained in the spectra. Section 5 contains a 
brief summary. 

2. ABSORPTION LINE SELECTION 

2.1. Extracting Spectra from the Simulation 

A numerical simulation of a $z = 2$ cold dark matter (CDM) universe, with $H_0 = 50$ km
s$^{-1}$ Mpc$^{-1}$, normalized to yield a present-day rms mass fluctuation of $\sigma_{16} = 0.7$ in spheres
of radius 16 $h_{50}^{-1}$ Mpc, was performed by Hernquist et al. (1996) using a method based
on smoothed-particle hydrodynamics, or SPH (Hernquist & Katz 1989; Katz, Weinberg &
Hernquist 1996). The simulation volume is a periodic cube of comoving size 22.2 $h_{50}^{-1}$ Mpc
drawn randomly from a CDM universe with $\Omega_m = 1$ and baryon density $\Omega_b = 0.05$. The
simulation includes the effects of a uniform photoionizing radiation field, where radiative
heating and cooling rates are computed assuming optically thin gas in ionization equilibrium
with this field. There are $64^3$ SPH particles and an equal number of dark matter particles;
the masses of the individual particles are $1.45 \times 10^{8}\,M_\odot$ and $2.8 \times 10^{9}\,M_\odot$, respectively. The
gas resolution varies from ~ 5 kpc in the highest density regions to ~ 200 kpc in the lowest
density regions. Although more recent simulations have been performed with higher spatial
resolution and with the parameters of the currently favored cosmological model ($H_0 = 70$
km s$^{-1}$ Mpc$^{-1}$, $\Omega_m = 0.35$, $\Lambda = 0.65$), most of the issues that relate to comparisons with
observations can be illustrated with this single simulation.

Spectra are extracted from the simulation cube by computing the imprint of the density, 
temperature, and velocity of the neutral gas fraction associated with each SPH gas particle 
on a flat continuum (for more details, see Katz et al. 1996). One thousand sets of spectra
were extracted from the simulation cube by randomly selecting 1000 positions from the
x-projection. For each of the 1000 primary lines of sight (PLOS), six adjacent spectra were
extracted that are offset 33.3, 100, 233, 400, 667, and 1000 $h_{50}^{-1}$ kpc in proper distance from
each PLOS. The azimuth angle of the adjacent spectra was varied randomly from one primary 
sightline to another. This strategy allows the examination of coherence of the absorption 
features in the extracted spectra over these (and other intermediate) transverse separations, 
in addition to enabling comparisons with observational data for quasar pairs having a range 
of separations. The transverse separations between the lines of sight range from less than 
the resolution of the simulation to a level where absorber coherence is expected to be very 
weak. 
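
The sampling strategy above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the extraction code used for the simulation: the function name `draw_sightlines` is hypothetical, the six adjacent sightlines of a given PLOS are assumed to share one random azimuth, proper offsets are converted to comoving box coordinates with a factor $(1+z)$, and positions are wrapped periodically.

```python
import numpy as np

BOX_COMOVING_KPC = 22_200.0   # 22.2 h50^-1 Mpc periodic cube (comoving)
Z = 2.0                       # redshift of the simulation output
SEPARATIONS_PROPER_KPC = np.array([33.3, 100.0, 233.0, 400.0, 667.0, 1000.0])

def draw_sightlines(n_primary, rng):
    """Return PLOS positions (n, 2) and adjacent positions (n, 6, 2) in comoving kpc."""
    # Random primary positions in the plane perpendicular to the projection axis.
    primary = rng.uniform(0.0, BOX_COMOVING_KPC, size=(n_primary, 2))
    # One random azimuth per primary sightline (assumption: shared by its six neighbours).
    azimuth = rng.uniform(0.0, 2.0 * np.pi, size=n_primary)
    # Proper separations converted to comoving box units.
    sep = SEPARATIONS_PROPER_KPC * (1.0 + Z)
    dy = np.outer(np.cos(azimuth), sep)
    dz = np.outer(np.sin(azimuth), sep)
    adjacent = np.stack([primary[:, :1] + dy, primary[:, 1:] + dz], axis=-1)
    # The cube is periodic, so offsets that leave the box wrap around.
    return primary, np.mod(adjacent, BOX_COMOVING_KPC)

rng = np.random.default_rng(0)
primary, adjacent = draw_sightlines(1000, rng)
```

Because every adjacent sightline is defined relative to its PLOS, any pair drawn from the same set also provides an intermediate transverse separation.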

Each spectrum is sampled with 1000 data points spanning 1924.5 km s$^{-1}$, which corresponds
to a spectral range of 23.4 Å at $z = 2$. Figure 2 shows six spectra extracted from the
simulation illustrating the range of absorption features, with the mean opacity increasing
from panel (a) to panel (f). Panels (c) and (d) have a typical opacity for extracted spectra
at this redshift, and less than 1% of the sight-lines are as heavily absorbed as the example
in panel (f).

2.2. Degrading the Extracted Spectra 

One objective of this paper is to compare absorber properties measured from the observed
spectra to those measured from the spectra extracted from the simulation. For a fair
comparison to be made, the extracted spectra are processed to reflect the inevitable limitations
of observational data due to instrumental effects of noise and resolution. The spectra
are extracted from the simulation cube in terms of optical depth as a function of velocity,
and are converted to transmission, $T = e^{-\tau}$, as a function of wavelength in Angstroms,
$\lambda_i = \lambda_{\mathrm{Ly}\alpha}(1 + z)(1 + v_i/c)$, where $z = 2$.
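
As a quick illustration of this conversion, the following Python sketch maps optical depth on a velocity grid to transmission on an observed wavelength grid. The function name `to_observed_frame` is hypothetical, and the Lyman-α rest wavelength of 1215.67 Å is supplied here rather than quoted from the text.

```python
import numpy as np

C_KMS = 299_792.458      # speed of light in km/s
LYA_REST_A = 1215.67     # Lyman-alpha rest wavelength in Angstroms

def to_observed_frame(tau, velocity_kms, z=2.0):
    """Convert optical depth vs. velocity to transmission vs. observed wavelength."""
    transmission = np.exp(-np.asarray(tau, dtype=float))
    velocity = np.asarray(velocity_kms, dtype=float)
    wavelength = LYA_REST_A * (1.0 + z) * (1.0 + velocity / C_KMS)
    return wavelength, transmission

# A 1000-pixel sightline spanning 1924.5 km/s, with no absorption for simplicity.
velocity = np.linspace(0.0, 1924.5, 1000)
tau = np.zeros_like(velocity)
wavelength, transmission = to_observed_frame(tau, velocity)
span = wavelength[-1] - wavelength[0]   # spectral range covered, in Angstroms
```

The value of `span` reproduces the spectral range of about 23.4 Å quoted for one pass through the simulation cube.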

As can be seen in Figure 1, the bulk of the UV data on the Lyman-α forest, which
contains information about the IGM over the last three-quarters of the age of the universe, 
is of rather modest SNR and resolution. This means that the complex dynamical processes 
that imprint themselves on the absorber profile are not visible at low redshift, even in the 
few published echelle spectra (Penton, Stocke, & Shull 2000; Tripp, Savage, & Jenkins 2000).
With non-echelle spectra, the spectral profile is dominated by the (typically) Gaussian shape
of the instrumental profile, and Doppler parameters cannot be measured. For the purposes
of this paper and the following Paper II, we chose a range of values for the resolution, $\Gamma_{\mathrm{res}}$,
and SNR typical of most published quasar spectra. To span the parameter space of available
data at any redshift we chose SNR = 10, 30, and 100, and $\Gamma_{\mathrm{res}}$ = 5, 20, 80, and 300 km s$^{-1}$,
and form twelve realizations of a single spectrum extracted from the simulation, each with 
a different combination of SNR and resolution. 

The extracted spectra are convolved with a Gaussian profile having a FWHM, $\Gamma_{\mathrm{res}}$,
with each of the four values chosen to represent the instrumental resolution of the available
observational data. The velocity resolution of the extracted spectra (~ 2 km s$^{-1}$) is much
higher than the resolution of observed quasar spectra, the range of which is represented by the
four values 5, 20, 80, and 300 km s$^{-1}$. We resample the extracted spectra to 3 pixels per resolution
element by spline interpolation. Poisson noise is added to each spectrum by scaling the
intensity at each pixel to (SNR)$^2$, adopting a Poisson deviate with that value as the
mean, and then renormalizing the spectrum.
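
A minimal Python sketch of this degradation sequence is given below. It is an illustration rather than the pipeline used for the paper: the function name `degrade` is hypothetical, the Gaussian convolution is applied periodically (an assumption motivated by the periodic simulation cube), and linear interpolation stands in for the spline resampling to keep the example dependency-free.

```python
import numpy as np

def degrade(transmission, pixel_kms, fwhm_kms, snr, rng, pixels_per_res=3):
    """Smooth, resample and add Poisson noise to a noiseless transmission spectrum."""
    n = transmission.size
    # 1. Gaussian line spread function of the requested FWHM, applied as a
    #    circular convolution (assumption: the spectrum wraps periodically).
    sigma_pix = fwhm_kms / (2.0 * np.sqrt(2.0 * np.log(2.0))) / pixel_kms
    lag = np.arange(n)
    lag = np.minimum(lag, n - lag)
    kernel = np.exp(-0.5 * (lag / sigma_pix) ** 2)
    kernel /= kernel.sum()
    smooth = np.fft.irfft(np.fft.rfft(transmission) * np.fft.rfft(kernel), n)
    # 2. Resample to a fixed number of pixels per resolution element
    #    (linear interpolation here; the text uses a spline).
    n_out = max(int(round(n * pixel_kms * pixels_per_res / fwhm_kms)), 2)
    v_old = np.arange(n) * pixel_kms
    v_new = np.linspace(0.0, v_old[-1], n_out)
    resampled = np.interp(v_new, v_old, smooth)
    # 3. Scale the continuum to SNR^2 counts, draw Poisson deviates, renormalize.
    counts = rng.poisson(np.clip(resampled, 0.0, None) * snr ** 2)
    return v_new, counts / snr ** 2

rng = np.random.default_rng(42)
flat = np.ones(1000)                       # unabsorbed continuum, 1000 pixels
v, noisy = degrade(flat, pixel_kms=1.9245, fwhm_kms=20.0, snr=30.0, rng=rng)
```

Scaling the continuum to SNR$^2$ counts gives a continuum signal-to-noise ratio of SNR after renormalization, since a Poisson deviate with mean $N$ has standard deviation $\sqrt{N}$.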
Figure 3 shows two different extracted spectra each having an opacity similar to the
mean at $z = 2$, exemplifying the typical absorption morphologies at that redshift. The
twelve realizations for each individual extracted spectrum show the degraded appearance 
due to the various choices of SNR and resolution. The highest resolution, highest SNR 
spectrum (top right panel in each case) is a fairly close match to the original spectrum 
extracted from the simulation. The best published spectra from the Keck telescopes appear 
like those in the top right panels, but there are a significant number of HST spectra that 
are similar in quality to the lower left panels. These plots clearly show the separate effects 
of SNR and resolution on the appearance of the spectrum. 



2.3. Fitting Continua to the Extracted Spectra 

The measurement of absorption features in quasar spectra, whether by line profile fitting 
or by using a type of flux statistic, must be performed relative to the quasar continuum. 
Continuum fitting is generally done by iteratively fitting the data points in a spectrum, 
eliminating negative or downward excursions from the fit, and refitting. The position of the 
final adopted continuum depends on the SNR of the spectrum and, more importantly, on 
the total flux absorption (which is directly related to the line density) and its distribution 
across the spectrum. Such a procedure only approaches the "true" continuum in the limit of 
infinite signal to noise, and even then has the potential to miss weak or shallow absorption 
features. Figure 3 demonstrates how features that can be seen clearly in the highest SNR, 
highest resolution spectra are lost in lower SNR, lower resolution spectra. 

Given that there are no regions of truly zero absorption, it is essentially guaranteed that 
line and continuum fitting will miss some portion of the Lyman-α opacity. The systematic
underestimation of the continuum level will in turn lead to underestimates of the equivalent
width, and therefore the column density, of fitted lines. We use extracted spectra from the
simulation to calibrate the size of this effect, and measure it as a function of SNR and
resolution. 

With many thousands of spectra to process, an automated procedure is required. After 
the simulation spectra are manipulated to mimic observational data, the continuum must 
be estimated as for the observed spectra. Any algorithm must be simple conceptually and 
should be able to fit continua automatically for spectra with a range in absorber opacity 
corresponding to a redshift range of < 2; < 3, and for spectra having a range of signal-to- 
noise ratios, 10 < SNR < 100. Any method implicitly assumes that the data retains some 
of the quasar's emission continuum, and this implies that continua fit to highly absorbed 
regions will be less representative of the true continuum. Mean absorption opacity and 
SNR are the primary factors in determining how well the true (or input) continuum can be 
recovered. 

Fitting a cubic spline is a standard method of estimating continua for observed spectra 
(Schneider et al. 1993). Splines offer the advantage of being able to fit a smooth, continu- 
ous function through a large number of data points. In quasar spectra, the large available 
wavelength range acts to define a smooth, slowly-varying continuum. However, the simula- 
tion volume offers only a short spectral range, and spline fitting to short extracted spectral 
segments with significant absorption can produce steeply-sloped continua. We suppress this 
effect in the case of the extracted spectra by "tripling" the spectrum, or lining up three 
identical spectra, taking advantage of the fact that the simulation cube is periodic. This is 
merely a practical mechanism for dealing with the edge effects in continuum and absorption 
line fitting — for the analysis, only lines whose centers fall in the central third of the new 
spectrum (the original extracted spectrum) are included. 

To estimate a continuum computationally, representing what one might fit "by eye," we 
implement a two step process. The first step is to iteratively fit a straight line to 23 Å sections
of a spectrum (corresponding to the independent spectral segment length for the extracted
spectra), providing an estimate of the amplitude of the continuum over that wavelength
range. This is a reasonable way to proceed because emission continua of quasars can be
closely approximated by a series of linear segments over ~ 20 Å scales, except in the vicinity
of Lyman-α emission. All data points are weighted equally and an iterative process rejects
points that deviate negatively by $2\sigma$ from the current fit, excluding them from subsequent
fits. The rejection process converges after only four cycles. 
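
The first step can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function name `fit_linear_continuum` is hypothetical, and $\sigma$ is taken to be the scatter of the currently retained points about the fit, which is an assumption since the text does not say how $\sigma$ is evaluated.

```python
import numpy as np

def fit_linear_continuum(wavelength, flux, nsigma=2.0, n_cycles=4):
    """Straight-line continuum fit with one-sided rejection of low points."""
    x = wavelength - wavelength.mean()     # centre so the intercept is the mid-segment level
    keep = np.ones(flux.size, dtype=bool)
    for _ in range(n_cycles):
        slope, level = np.polyfit(x[keep], flux[keep], 1)   # equal weights
        resid = flux - (slope * x + level)
        sigma = resid[keep].std()          # assumption: scatter of retained points
        keep &= resid > -nsigma * sigma    # reject negative outliers only, cumulatively
    slope, level = np.polyfit(x[keep], flux[keep], 1)
    return slope, level, keep

# Synthetic 23 Angstrom segment: unit continuum, two absorption lines, Gaussian noise.
rng = np.random.default_rng(1)
wave = np.linspace(3640.0, 3663.0, 300)
lines = sum(0.6 * np.exp(-0.5 * ((wave - c) / 0.5) ** 2) for c in (3646.0, 3655.0))
flux = 1.0 - lines + rng.normal(0.0, 0.03, wave.size)
slope, level, keep = fit_linear_continuum(wave, flux)
naive_level = np.polyfit(wave - wave.mean(), flux, 1)[1]    # no rejection, for comparison
```

Centering the wavelength axis makes the fitted intercept equal to the continuum amplitude at the mean wavelength of the segment, which is the quantity the second step needs. In this synthetic case the rejection raises the fitted level well above the unclipped fit, which is pulled down by the absorption lines.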

The method of continuum fitting used by the HST Absorption Line Key Project divides 
a spectrum into bins and fits a spline to the average flux in these bins. However, this bin 
size is larger than the length of our extracted simulation spectra, so the first step described 
above provides our first estimate to the continuum. The second step produces a smoothly 
varying continuum over the length of the spectrum using the amplitude of each of the fitted 
line segments from the first step. A single point is created for each of the three segments
by evaluating the amplitude of the fitted straight line at the average wavelength of each 
segment. A cubic spline is fit through these points for the length of the spectrum. Visual 
inspection of spectra with a range of flux decrements demonstrates that this two step method
results in a more satisfactory fit to the continuum than the Key Project method, especially
for heavily absorbed spectra, as the first step has the effect of "floating" the continuum to
what appears "by eye" to be a more appropriate level. 

To evaluate how well the algorithm works for the observational spectra, we compare the
final continua fitted with the new two-step process to the continua fitted with the method
of the HST Absorption Line Key Project. Some manual adjustment of the continuum
is necessary in the vicinity of the steeply sloped Lyman-α emission features. Four of the six
spectra to be analyzed in Paper II of this series — PG 1343+264A,B, LB 9605, and LB 9612 
— were studied using both methods, and the ratio of the two fitted continua is unity at 
all wavelengths to well within the errors. This demonstrates that no significant systematic
effects are introduced by the continuum fitting algorithm. We can also infer that absorption 
line parameters returned by the two methods will not significantly differ. 

To show how SNR and resolution affect the level of the fitted continuum, we fit continua 
using our method to 1000 randomly-selected spectra extracted from the simulation. We 
then compare the mean opacity computed for each of 1000 undegraded (i.e. raw) spectra 
extracted from the simulation to the mean opacity calculated relative to the fitted continuum 
for each degraded spectrum in each of the 12 realizations of SNR and $\Gamma_{\mathrm{res}}$. Figure 5 shows
the distribution of the difference of these two quantities for all 1000 spectra and for each
realization. This difference is equivalent to the difference between the input continuum level
(unity, by definition) and the fitted continuum level, converted into an opacity decrement,
$\Delta\tau$.
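
The relation between a misplaced continuum and the opacity decrement can be illustrated with a short Python sketch. The function names are hypothetical, and "mean opacity" is taken here to be the effective opacity $-\ln\langle T \rangle$, which is an assumption about the definition. For a fitted continuum at a constant level $C$ below unity, the decrement reduces to $-\ln C$.

```python
import numpy as np

def effective_opacity(transmission):
    """Effective opacity of a spectrum, defined here as -ln of the mean transmission."""
    return -np.log(np.mean(transmission))

def opacity_decrement(raw_transmission, degraded_flux, fitted_continuum):
    """Opacity lost when absorption is measured relative to a fitted continuum."""
    tau_true = effective_opacity(raw_transmission)
    tau_measured = effective_opacity(degraded_flux / fitted_continuum)
    return tau_true - tau_measured

# Toy case: a noiseless spectrum whose fitted continuum sits 10% below the true level.
x = np.linspace(-5.0, 5.0, 501)
raw = 1.0 - 0.5 * np.exp(-0.5 * x ** 2)
delta_tau = opacity_decrement(raw, raw, np.full_like(raw, 0.9))
expected = -np.log(0.9)     # decrement implied by a continuum placed at 0.9
```

In this toy case `delta_tau` equals `expected` to rounding error, so a continuum placed 10% too low hides an opacity of about 0.105 regardless of how much absorption is present.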

An opacity decrement is the unavoidable consequence of any fitting procedure where the 
continuum is not known a priori — as is the case for all observational data. The quantity 
$\Delta\tau$ therefore represents "lost" opacity, and it has two components. The first is due to the
fact that strong absorbers will have their column densities slightly underestimated owing to 
the lower placement of the continuum. The second is due to the sum of broad and/or low 
level absorption that cannot be recovered by fitting a noisy continuum (the Gunn-Peterson 
approximation). See Figure 2d for a good example. 

Figure 4 shows how the severity of underestimating opacity depends on both SNR and 
resolution by comparing the mean opacity measured for the undegraded spectra to the mean 
opacity measured relative to the fitted continuum for the degraded spectra. In Figure 5, 
we plot the distribution of the difference of these two quantities, $\Delta\tau$. At the level of the
highest quality echelle data (top-right panel), 80-90% of the absorption opacity is recovered
in almost every case. At the level of poor quality HST data (lower-left panel), the opacity
underestimation is typically 50%. The tail to high $\Delta\tau$ in each panel represents the relatively
rare cases of heavily absorbed spectra, where no automated procedure can adequately recover 
the absorption. 

3. ABSORPTION LINE MEASUREMENT 

3.1. Line Measurement and Deblending 

We have examined the general effects of a wide range of resolution and SNR on the 
appearance of the spectra and on the ability to recover the true continuum level. For the 
phase of absorption line measurement, we home in on a narrower range of parameter space 
with the four combinations: SNR = 10 and 30, and resolution $\Gamma_{\mathrm{res}}$ = 80 and 300 km s$^{-1}$. It
can be seen from Figure 1 that these values bracket most existing HST data and much of
the anticipated data from the Sloan quasar survey. This range also encompasses the data we
will use in Paper II for a Lyman-α coherence measurement at $z = 2$. Despite the conceptual
limitations of the line/continuum fitting paradigm, we acknowledge the fact that a huge 
amount of observational data has been published using this type of procedure. A major 
goal of this paper is to form a bridge between the appropriate techniques used to analyze 
simulations and the traditional methods of quasar spectroscopy. 

3.1.1. The Absorption Line Profile 

Available quasar pair data on the Lyman-α "forest" have SNR and resolution generally
no better than SNR = 30 and $\Gamma_{\mathrm{res}}$ = 80 km s$^{-1}$. This corresponds to the highest SNR and
resolution of the four parameter combinations for which we are measuring absorption lines in the
extracted spectra. Detection of the weakest lines is limited by the SNR and the resolution
of the data, and measurement of the column density depends on the Doppler parameter.
However, integration of the evolution function for Lyman-α absorbers shows that less than
1% of absorbers are expected to have column densities higher than $\log N = 15$ (Scott,
Bechtold, & Dobrzycki 2000). The range of Doppler parameters for Lyman-α absorbers at
$z \sim 2$ spans approximately 20-80 km s$^{-1}$ (Hu et al. 1995). For an absorber with a median
Doppler value $b = 30$ km s$^{-1}$, $\log N = 15$, and $\Gamma_{\mathrm{res}}$ = 80 km s$^{-1}$, the line profile is just
saturated or reaches zero flux at the line center. For higher resolution data, line-fitting
methodologies represent individual absorbers with a Voigt profile, but we demonstrate that
for column densities less than $\log N = 15$ and resolutions lower than $\Gamma_{\mathrm{res}}$ = 80 km s$^{-1}$, the
use of a Gaussian profile is justified.

We generate the flux profile for a Lyman-α absorption line (with essentially infinite
SNR and resolution) using subroutines from the program AutoVP (Davé et al. 1997) for
the case of $\log N = 15$ and $b = 80$ km s$^{-1}$. The convolution of this "intrinsic" line profile with
the instrumental line spread function (a Gaussian of FWHM $\Gamma_{\mathrm{res}}$) is the expected flux line profile.
Sampling of the dispersion is chosen to mimic that of the degraded simulation spectra for 
this parameter space. 

One thousand absorption lines were created and Poisson noise was added to represent 
SNR = 30. A Gaussian profile was fitted to each Monte Carlo simulated line. This fitted 
Gaussian profile was compared to the Monte Carlo line profile, yielding an average $\chi^2$ value of
0.011. For 50 degrees of freedom (the number of data points in the profile), this corresponds
to a probability of less than 10~^ that the two profiles are different. We also note that low 
resolution ($\Gamma_{\mathrm{res}} >$ 80 km s$^{-1}$) renders undetectable all the complexities in the absorber profile
caused by hydrodynamic effects. The use of a Gaussian profile in fitting absorption features 
is therefore adequate for our immediate purposes, and for our initial science application in 
Paper II.
3.1.2. Line Selection Algorithm

Absorption line selection and measurement is performed using software originally writ- 
ten by Tom Aldcroft (Aldcroft 1993) that has since been substantially revised. The most 
recent version of the software retains the original code structure, the graphic interface tools, 
and the fundamental numerical subroutines. New methodology for fitting continua, and 
new algorithms for selecting and fitting absorption features have been implemented. The 
continuum-fitting process is described in the previous section. Line selection and fitting is a 
two-step process where a preliminary list of lines is selected, which forms a first estimate to 
the simultaneous fit that is determined in a subsequent step. The line selection and fitting 
methodology was first described in Petry, Impey, & Foltz (1998). Some refinements to the 
line fitting algorithm have been added to generalize the line-finding techniques to a range of
data quality and to eliminate manual intervention. The range of data quality addressed in
this work is SNR = 10 to 30 and $\Gamma_{\mathrm{res}}$ = 80 to 300 km s$^{-1}$.

Selecting a combination of absorption line profiles to fit an absorption region is
difficult because of the effects of instrumental resolution and SNR on the true features in
the spectrum. The second extraction in Figure 3 shows an example (at ~ 3662 Å) where
lines that are easily resolved at 20 km s$^{-1}$ cannot be separated at a resolution of 80 km s$^{-1}$.
The SNR is not as important a factor in concealing the structure of an absorption feature 
until it becomes very low. No procedure can recover information that is lost due to limited 
instrumental resolution, but Gaussian profile fitting does provide a fair comparison between 
the simulation spectra and observed spectra. Even if a Gaussian is a good approximation to 
the shape of strong absorption features, we do not expect that line-fitting can yield unique 
fits. The biggest danger is expected to be over-fitting, such as when a single broad feature is 
fit by several overlapping Gaussians. However, it is also possible to under-fit, such as when 
noise causes a weak feature to fall below the significance threshold for a single unresolved 
line. In what follows, we describe a robust procedure for dealing with most of the situations
presented by real quasar spectra. 

One simple way of fitting a number of Gaussian profiles to a predefined absorption region 
is to fit the peak of a Gaussian to the lowest flux point in the spectrum, subtract the profile
from the spectrum, and then repeat the process at each local minimum. The method is 
considered to have converged when the result is consistent with the flux error of the original 
spectrum. However, with this method, a region which may be better fit in a sense by two
barely resolved lines, will actually be fit by one strong line in the center, possibly with an 
unphysically large width, straddled by two smaller "satellite" lines. The sequential approach 
has the basic problem that lines are defined one at a time, which constrains the choice of 
subsequent lines, and does not ensure an optimal or unique fit. 

We have devised an iterative method to fit the maximum number of lines to a region 
and distribute the lines optimally across the region. Unresolved lines are assumed, i.e. the 
absorber is well approximated by the instrument line profile (a Gaussian). Subsequently, all 
combinations of lines from this preliminary list are fit simultaneously to the region using a 
Marquardt minimization technique, and the fit that meets preset criteria is chosen as the
best fit. We consider that the best fit is the deepest depression in the surface of solutions.
The Marquardt minimization does not allow for large changes in the line center or line width;
by design, the preliminary step of finding lines and subsequently testing each combination of
lines samples the surface of solutions well enough to find the best solution. This is justified 
in practice because the iterative procedure rarely leads to substantial changes in the centers 
and widths of the strongest features. 

In the first phase of line selection, an iterative search is made for all minima in the 
data array containing the flux values for each spectrum that has been convolved with the
instrumental profile. A primary line list is formed from minima found in the first pass and a
secondary list is made from minima found in the convolved array after lines in the primary
list are subtracted from the original data. The primary and secondary lists combine to 
form the preliminary line list that defines an initial "best guess" at the final line list. The 
rationale for combining primary and secondary lists is as follows. The minima in the data 
array locate the strongest absorption features, but these minima do not account for 
all the absorption in a blended region. The secondary list serves to "interleave" the primary 
list, and the combination of the two yields a fairly complete and robust map of the absorbers. 

The iterative method for initially locating and measuring lines in the spectrum of each 
component follows that described in detail by Paper II of the HST Quasar Absorption 
Line Key Project (Schneider et al. 1993). The equivalent width, $W$, at every pixel in each 
spectrum is computed by centering the spectral line profile on the $i$th pixel and performing 
the weighted sum over the $6\sigma$ limits of the Gaussian, forming the array $W_i$. Similarly, this is 
done for the error array, $\sigma_{W_i}$, and the interpolated error array, $\bar{\sigma}_{W_i}$. The interpolated error 
array is calculated by replacing flux errors for data points that deviate negatively by more 
than $2\sigma$ by the average of the errors in up to five (on each side) of the adjacent continuum 
points. Continuum in this sense is defined by points lying within a $2\sigma$ deviation from the 
fitted continuum. This array differs from the error array in that the errors at the centers of 
the absorption lines are increased slightly to reflect the noise in the adjacent true continuum. 
The SNR of an unresolved line is conventionally defined as $W_i/\sigma_{W_i}$, whereas we choose to 
define "significance" as $W_i/\bar{\sigma}_{W_i}$, which is used to select preliminary lines in the first phase. 
This correction for the drop in the errors at the centers of absorption lines, where the flux 
errors are smaller, allows for a more uniform comparison of relative line strengths. 
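
The construction of the interpolated error array can be illustrated with a short sketch. This is not the code used for this work; the function name, the argument names, and the reading of "adjacent continuum points" as the nearest continuum pixels on each side are assumptions made for illustration.

```python
def interpolated_errors(flux, err, continuum, n_adjacent=5, n_sigma=2.0):
    """Return a copy of `err` in which the errors of strongly absorbed pixels
    are replaced by the mean error of nearby continuum pixels.

    Hypothetical helper: names and defaults are illustrative only.
    """
    n = len(flux)
    # A pixel counts as continuum if it lies within n_sigma of the fitted continuum.
    is_cont = [abs(flux[i] - continuum[i]) <= n_sigma * err[i] for i in range(n)]
    out = list(err)
    for i in range(n):
        # Only pixels deviating negatively (absorption) by more than n_sigma are replaced.
        if continuum[i] - flux[i] > n_sigma * err[i]:
            # Nearest continuum pixels on each side, up to n_adjacent per side.
            left = [err[j] for j in range(i - 1, -1, -1) if is_cont[j]][:n_adjacent]
            right = [err[j] for j in range(i + 1, n) if is_cont[j]][:n_adjacent]
            neighbours = left + right
            if neighbours:
                out[i] = sum(neighbours) / len(neighbours)
    return out
```

Dividing the equivalent width array by this array, rather than by the raw error array, gives the "significance" used for the preliminary line selection.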

The primary line list is generated by identifying the minima in the $W_i$ array within a 
window of width one half the instrumental resolution that have a significance ($W_i/\bar{\sigma}_{W_i}$, 
using the interpolated error) of three or more. A parabolic fit through the two adjacent points 
identifies the line center, and the values of $W_i$ and $\sigma_{W_i}$ corresponding to this wavelength 
give the equivalent width and its error. The profiles of the lines are calculated and subtracted 
from the original spectrum. This subtracted spectrum is then convolved and a secondary line list 
is derived in the same way. The secondary list is then subtracted from the original spectrum and 
another pass is made that refits only the primary lines and may add new lines or drop existing 
lines. Adding the secondary list increases the number of lines by a factor of 1.5 to 2 relative 
to the number found in the primary list. 
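
The parabolic refinement of a line center can be sketched as follows. This is a generic three-point vertex formula, not the code used for this work; the function name is hypothetical and uniform pixel spacing is assumed.

```python
def parabolic_center(x, y, i):
    """Refine the position of a local extremum at index i using the parabola
    through the points i-1, i, i+1. Assumes uniformly spaced x values."""
    y0, y1, y2 = y[i - 1], y[i], y[i + 1]
    denom = y0 - 2.0 * y1 + y2
    if denom == 0.0:
        # The three points are collinear, so there is no vertex to locate.
        return x[i]
    step = x[i + 1] - x[i]
    # Offset of the vertex from x[i], in units of the pixel spacing.
    return x[i] + 0.5 * (y0 - y2) / denom * step
```

For samples drawn from an exact parabola the vertex is recovered exactly; for real data the result is a sub-pixel estimate of the line center.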

It was found empirically that after approximately five iterations of finding lines for first 
the primary and then the secondary lists, the number of lines converged to a repeating series 
as the line centers shifted about slightly depending on the noise. The number of lines that 
are added or lost by all subsequent iterations is no more than a few percent of the total 
list. The preliminary line list is formed from the combination of the two lists, and if their 
profiles are subtracted from the original data array, the result has a mean flux of zero and 
variations entirely consistent with noise. 

3.1.3. Line Fitting Algorithm 

The primary list of lines is input as a first guess to the simultaneous best fit of each 
absorbed region in a spectrum. For practical reasons the spectrum is broken up into regions. 
There are several ways of doing this and we chose to step through each spectrum and define 
the region to be fit as beginning where the flux array downward-crosses the continuum and 
ending where it upward-crosses the continuum. This method of defining the spectral regions 
is reasonable for data of the resolution and SNR under consideration here, because the number 
of upward- and downward-crossings divides the spectrum into sections that can be fit by a 
moderate number of preliminary lines, and because it limits the number of regions where 
no preliminary lines are found. Fitting all combinations of the preliminary lines makes it 
computationally impractical to fit more than 11 lines per region, so on the rare occasions 
that this situation occurs the region is split at the highest flux data point in the region, and 
the fit is performed for the new region with a smaller number of lines. 

Lines from the initial list that fall within the region to be fit were counted, $n$, and the 
number of all possible combinations of these lines, $N_{\rm comb}$, was computed by summing the 
binomial coefficient, $\binom{n}{x}$, which gives the number of combinations possible for $n$ lines taken 
$x$ positions at a time, over the number of possible positions, $x = 1 \ldots n$:

$$N_{\rm comb} = \sum_{x=1}^{n} \binom{n}{x} = \sum_{x=1}^{n} \frac{n!}{x!\,(n-x)!}. \qquad (1)$$

The derived lines for each individual combination were worked out and saved, and each 
combination of lines was fit simultaneously to the region following a Marquardt minimization 
technique that varies the amplitude, center and width of the lines. The line parameters and 
the reduced chi-square ($\chi^2_\nu$), which is later used to select the best fit, for each combination 
were saved. The method of fitting all combinations of preliminary lines optimizes selection 
of the best fit to any absorption feature by maximally utilizing the information provided by 
the preliminary line list. 
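
The size of the search can be made concrete with a short sketch. The function names are hypothetical; the sum over binomial coefficients equals $2^n - 1$, so the 11-line limit corresponds to 2047 simultaneous fits per region.

```python
from itertools import combinations
from math import comb


def n_combinations(n):
    """Number of non-empty subsets of n preliminary lines."""
    return sum(comb(n, x) for x in range(1, n + 1))


def line_subsets(lines):
    """Yield every non-empty combination of the preliminary lines,
    smallest subsets first."""
    for x in range(1, len(lines) + 1):
        yield from combinations(lines, x)
```

Each subset produced by `line_subsets` would be passed as the starting guess to one simultaneous fit of the region.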

However, in practice, $\chi^2_\nu$ is not a sensitive test of overfitting in selecting the best fit to 
a region (Rauch et al. 1992). For example, it is possible for a fit to produce a $\chi^2_\nu$ that is 
improved from the prior fit but where the errors in the equivalent width are larger than the 
equivalent width itself. In our application of this procedure, the central wavelengths are fairly 
well determined and do not change by more than a few standard deviations with subsequent 
fits. We therefore imposed secondary constraints to ensure that the final fit consisted of 
meaningful line parameters. The final "best" fit is chosen as the fit with the lowest $\chi^2_\nu$ that 
also fulfills the following criteria: (a) $\chi^2_\nu \le 100$, (b) the errors in the fitted equivalent width 
and the FWHM are smaller than the measurements themselves, (c) the minimum line separation 
for lines in a fit is $\Delta\lambda = 1.12$ Å, and (d) 0.65 Å < FWHM < 4.5 Å. 
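
The selection of the final fit can be sketched as a filter followed by a minimum. The data classes, field names, and function names below are hypothetical and stand in for whatever the fitting software returns; only the four criteria and the numerical limits come from the text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Line:
    center: float    # central wavelength (Å)
    ew: float        # fitted equivalent width (Å)
    ew_err: float    # error on the equivalent width (Å)
    fwhm: float      # fitted FWHM (Å)
    fwhm_err: float  # error on the FWHM (Å)


@dataclass
class Fit:
    chi2: float      # reduced chi-square of the simultaneous fit
    lines: List[Line]


def passes_constraints(fit, chi2_max=100.0, min_sep=1.12,
                       fwhm_min=0.65, fwhm_max=4.5):
    """Apply criteria (a) to (d) to one candidate fit."""
    if fit.chi2 > chi2_max:                                    # (a)
        return False
    for ln in fit.lines:
        if ln.ew_err >= ln.ew or ln.fwhm_err >= ln.fwhm:       # (b)
            return False
        if not (fwhm_min < ln.fwhm < fwhm_max):                # (d)
            return False
    centers = sorted(ln.center for ln in fit.lines)
    return all(b - a >= min_sep for a, b in zip(centers, centers[1:]))  # (c)


def select_best_fit(fits) -> Optional[Fit]:
    """Return the acceptable fit with the lowest reduced chi-square, or None."""
    ok = [f for f in fits if passes_constraints(f)]
    return min(ok, key=lambda f: f.chi2) if ok else None
```

A return value of `None` corresponds to the case where the region must be split and refit.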




The first two of these constraints apply generally to the fitting procedure. The restriction 
on $\chi^2_\nu$ is an empirically determined practical limit on the largest value expected to be 
associated with a reasonable fit. The best fit can have a $\chi^2_\nu$ that is rather high, simply 
because there may be a portion of the fit region where no lines are fit but where 
the normalized flux is not exactly at the continuum level. Failure of this criterion forces a 
redetermination of the fitted region by dividing it at the wavelength having the next highest 
intensity, and a refitting of the preliminary lines to obtain a fit having a lower $\chi^2_\nu$. The 
restriction on the errors of the fitted line parameters prevents over-fitting. 

The last two constraints depend on the range of SNR and resolution of the data being 
considered. In this case, the choices were justified as follows. The minimum and maximum 
FWHM values are determined by convolving the range of expected Doppler widths (Hu et 
al. 1995) with the instrumental profile, and allowing for variation in the minimum FWHM 
due to noise. The minimum line separation also depends on resolution and noise, and was 
chosen empirically by forming the distribution of line separations and then examining the 
reliability of individual fits. Instrumental resolution in this test case corresponds to 80 
km s$^{-1}$. There is a natural break in the distribution at ~ 70 km s$^{-1}$, and lines selected 
with smaller separations than this represent an over-fit with respect to the features in the 
undegraded simulation spectrum. Lines with separations of more than ~ 90 km s$^{-1}$ are 
reliably selected by the algorithm. For lines with separations between 70 and 90 km s$^{-1}$ it 
is not clear by eye whether adding the second line to the fit is justified, and in all cases the 
undegraded simulation spectrum shows an asymmetry or two features close together. A Monte 
Carlo simulation to determine the recovery rate of two close lines shows that the software 
does not reliably recover the input lines until they are separated by ~ 100 km s$^{-1}$. This 
motivates our minimum separation of 1.12 Å. However, we note that the statistical results 
of the analysis are not particularly sensitive to this exact choice. 
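
The correspondence between velocity and wavelength separations can be checked with a small conversion, $\Delta\lambda = \lambda_{\rm obs}\,\Delta v / c$. The sketch below assumes that the wavelengths quoted are observed-frame wavelengths of Lyman-$\alpha$ at $z = 2$; that assumption, and the function names, are illustrative and not taken from the text.

```python
C_KMS = 299792.458   # speed of light (km/s)
LYA_REST = 1215.67   # rest wavelength of Lyman-alpha (Å)


def velocity_to_wavelength(dv_kms, z):
    """Observed-frame wavelength interval (Å) for a velocity interval (km/s)."""
    return LYA_REST * (1.0 + z) * dv_kms / C_KMS


def wavelength_to_velocity(dlam, z):
    """Velocity interval (km/s) for an observed-frame wavelength interval (Å)."""
    return C_KMS * dlam / (LYA_REST * (1.0 + z))
```

Under this assumption, the 80 km s$^{-1}$ resolution corresponds to roughly 0.97 Å, and a separation of 1.12 Å to roughly 92 km s$^{-1}$.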
If the procedure fails one or more of the tests in any particular spectral region, the 
region is split at the next highest minimum intensity point and a new set of simultaneous 
fits is made using the smaller number of lines. If all of these fits fail, the last fit is chosen 
but is tagged. The tagged lines were examined for a subset of 200 spectra and were only 
3% of the total number of lines selected. Two thirds of these were moderately broad, weak 
features whose shape had been distorted by the addition of noise; in general, a line was 
found in the vicinity of the true feature even if the addition of noise and the degraded resolution 
made it more difficult to detect. Almost all of the lines from "failed" fits are weak enough that they 
will not meet the selection criteria to be included in the subsequent analysis. About one 
in five were strong broad features where the chosen fit appeared appropriate, but the $\chi^2_\nu$ was 
too high. A remaining tiny number, barely 0.01% of the whole sample, are weak features fit 
to the edge of broader absorption regions or are not real features at all. This same visual 
inspection of many of the fits showed that under-fitting or over-fitting is not a significant 
problem for the algorithms. We conclude that the fraction of strong regions of absorption 
that are not well fit by the line/continuum algorithm is negligible. 

The procedures for line-fitting are robust and can be applied in an automated way, but 
they are also somewhat complex. Figure 6 shows how they work in practice, as applied to 
the two extracted spectra from Figure 3, in each of the four combinations of SNR = 10 
and 30, $\Gamma_{\rm res}$ = 80 and 300 km s$^{-1}$. In other words, Figure 6 shows actual line/continuum 
fits to the four parameter choices at the lower left of Figure 3. Fitted continua and lines 
are overplotted in each case, with the location of significant ($5\sigma_{\rm det}$) absorbers shown by tick 
marks. "Significance" is defined in two ways: $\sigma_{\rm det}$ is related to the flux error and is used to 
describe the strength of a line in terms of the detection limit of the data; $\sigma_{\rm fit}$ is the error in 
the equivalent width returned by the software and is a measure of line reliability or goodness 
of fit. In practice, $\sigma_{\rm det}$ can be used to set a uniform detection threshold for absorbers and 
is directly related to the SNR, whereas $\sigma_{\rm fit}$ is an indicator of how well the equivalent width 
is determined; lines with large $\sigma_{\rm fit}$ might be found in noisy regions of the spectrum or in 
the wings of broader absorption features. All lines fit by the software are retained regardless 
of strength or reliability, but for the analysis we define two categories: secure and marginal. 
The criterion for inclusion in either list is $5\sigma_{\rm det}$. Lines with an equivalent width < $5\sigma_{\rm fit}$ make up the marginal 
list, and lines that have an equivalent width > $5\sigma_{\rm fit}$ constitute the secure list. Marginal lines 
may be reliable enough to be useful in the comparison with observed spectra, but in this 
work we only consider lines from the secure list in the analysis. In Figure 6, longer, thicker 
tick marks show the location of the secure lines, and shorter, narrower tick marks denote 
lines from the marginal category. 
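
The two-level classification can be written compactly. In this sketch the function and argument names are hypothetical, and the treatment of a line lying exactly at five times the fit error (assigned to the marginal list) is an assumption, since the text gives only strict inequalities.

```python
def classify_line(ew, sigma_det, sigma_fit, threshold=5.0):
    """Classify a fitted line as 'secure', 'marginal', or None.

    ew        : fitted equivalent width
    sigma_det : detection-limit error derived from the flux error
    sigma_fit : error on the equivalent width returned by the fit
    """
    # Lines below the detection threshold enter neither list.
    if ew < threshold * sigma_det:
        return None
    # Above the detection threshold, the fit error decides the category.
    return "secure" if ew > threshold * sigma_fit else "marginal"
```

Only lines returned as `"secure"` would enter the analysis that follows.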

4. PROPERTIES OF THE ABSORBERS 

4.1. Line Counts Along Single Sight-lines 

Absorption features were measured using the line-fitting algorithm for 300 primary lines 
of sight (PLOS) and for their adjacent sight-lines spanning separations from 33.3 to 1000 
$h_{50}^{-1}$ kpc, a total of 2100 spectra extracted from the simulation. This was done for two 
different signal-to-noise ratios, SNR = 10 and 30, and two resolutions, $\Gamma_{\rm res}$ = 80 and 300 
km s$^{-1}$, forming four realizations of the data. We compared the distribution of absorber 
properties measured from the simulated spectra to those found from observations of large 
samples of spectra published in the literature as a basic cross-check of our method. 

In Figure 7, we show the distribution of the line centers of absorbers in 300 PLOS of the 
SNR = 30, $\Gamma_{\rm res}$ = 80 km s$^{-1}$ realization for each of the three projection axes. The deviations 
from the mean (shown as the dotted line) indicate large scale, coherent structures within 
the simulation box, having filamentary and sheetlike properties. However, the average of 
the 3 projections converges toward the mean value, indicating that the absorbers approach a 
uniform distribution over the full size of the simulation box, and demonstrating that there are no 
artifacts introduced by edge effects. Additionally, we examine the distribution of the number 
of absorbers per line of sight in Figure 8. This distribution is consistent with a Gaussian 
distribution, which is the expectation for a random distribution of absorbers. In particular, 
the high multiplicity of absorbers that would indicate substantial clustering is not seen. 

Table 1 lists the average number density ($dN/dz$) of secure lines per line of sight above 
a limiting rest equivalent width threshold ($5\sigma_{\rm det}$) for each of the four realizations of the 
simulated spectra (column 5). For Sample 1 (column 1) we used 1000 PLOS for increased 
statistics on the highest SNR and resolution realization, and for the other three samples we 
used 300 PLOS. The number of lines from observations is obtained in two ways from a study 
performed by Scott, Bechtold, & Dobrzycki (2000). These values are listed in columns 6 and 
7 of Table 1. First, we simply count the number of absorbers found in the small wavelength 
range spanned by the simulated spectra at $z = 2$ (J. Scott, private communication). However, 
a limited number of absorbers are found in this 23 Å wide region and so the error bars are 
large. Since the evolution function (described below) is close to linear over $2 < z < 2.1$, 
the Poisson error can be reduced by using the lines counted in this five times larger spectral 
region as an estimate for the smaller spectral region actually spanned by the simulation. 
Second, the average number of lines at $z = 2$ can be computed from coefficients of an 
evolution function fitted to an appropriate sample of observed absorbers. This method is 
less direct but it has the merit of incorporating most of the available data. The number of 
Lyman-$\alpha$ lines per unit redshift per unit equivalent width can be described as follows: 



$$\frac{\partial^2 N}{\partial z\,\partial W} = \frac{A_0}{W^*}\,(1+z)^{\gamma}\,\exp\left(-\frac{W}{W^*}\right). \qquad (2)$$

Integrating Equation 2 with respect to $W$ gives the number of lines per line of sight with a 
rest equivalent width greater than 0.16 Å for the chosen values of the normalization, 5.86, 
$W^* = 0.257$, and $\gamma = 2.42$ from Scott, Bechtold, & Dobrzycki (2000): 

$$\frac{dN}{dz} = \mathcal{A}_0\,(1+z)^{\gamma}, \quad {\rm where} \quad \mathcal{A}_0 = A_0 \exp\left(-\frac{W}{W^*}\right). \qquad (3)$$

Further integration with respect to $z$ gives the number of lines per spectrum in the range 
$z_{\rm min} < z < z_{\rm max}$: 

$$N = \frac{\mathcal{A}_0}{\gamma + 1}\left[(1 + z_{\rm max})^{\gamma+1} - (1 + z_{\rm min})^{\gamma+1}\right]. \qquad (4)$$

The number of lines in this redshift interval for a different equivalent width limit, $W_{\rm lim}$, is 
obtained by scaling by the factor $\exp\left[-(W_{\rm lim} - 0.16)/W^*\right]$. Errors on $N$ are obtained using the 
$1\sigma$ error bars on $\gamma$. See column 7 in Table 1. 
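
The integrated evolution function is simple to evaluate numerically. The sketch below implements a power law in $(1+z)$, its integral over a redshift interval, and the exponential rescaling to a different equivalent width limit. The function names are hypothetical, and `norm` denotes the normalization appropriate to the reference limit of 0.16 Å; no specific numerical result from the paper is reproduced here.

```python
import math


def dn_dz(z, norm, gamma):
    """Number of lines per unit redshift, norm * (1 + z)**gamma."""
    return norm * (1.0 + z) ** gamma


def n_lines(z_min, z_max, norm, gamma):
    """Number of lines per spectrum between z_min and z_max
    (the analytic integral of dn_dz)."""
    p = gamma + 1.0
    return norm / p * ((1.0 + z_max) ** p - (1.0 + z_min) ** p)


def rescale_to_limit(n, w_lim, w_star, w_ref=0.16):
    """Rescale a line count from the reference limit w_ref to the limit w_lim
    for an exponential equivalent width distribution with scale w_star."""
    return n * math.exp(-(w_lim - w_ref) / w_star)
```

Over a narrow interval such as $2 < z < 2.1$ the integral is very close to `dn_dz` at the midpoint times the interval width, which is the sense in which the evolution function is nearly linear there.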

A comparison is made only for the $\Gamma_{\rm res}$ = 80 km s$^{-1}$ realizations (Samples 1 and 3), 
because this resolution closely matches the data set used by Scott, Bechtold, & Dobrzycki 
(2000). The error bars listed in Table 1 merely reflect the Poisson error and do not 
include systematic uncertainties due to differences in the line-fitting algorithms that can be 
as large as 10-20%. Sample 3 most closely matches the sensitivity of the data set used by 
Scott, Bechtold, & Dobrzycki (2000), and the number of lines found in the extracted spectra 
agrees to within 20% with both methods for counting observed lines. Sample 1 in Table 1 
agrees with the (poorly-determined) number predicted using a model for the evolution of 
the absorbers, and it also agrees to within 20% with the expected number from the direct line 
counting method. 

In addition to a check of aggregate line counts, it is important to check whether extracted 
spectra from the simulation return the observed distribution of line strength. We therefore 
compare the number of lines per equivalent width bin, dN/dW, to a measurement that uses 
the data of Scott, Bechtold, & Dobrzycki (2000). This recent study provides better statistics 
at $z = 2$ than the Hubble Space Telescope Absorption Line Key Project (Bahcall et al. 1993; 
Weymann et al. 1998). Figures 9 and 10 are plots of the number of secure (> $5\sigma_{\rm fit}$) absorbers 
counted in the spectra from the simulations, binned by rest equivalent width, for Samples 1 
and 3. Both distributions are well within the $1\sigma$ error bars on $\gamma$, showing that the absorber 
statistics obtained by line-fitting applied to the simulation spectra are consistent with those 
from observed samples. Distributions that include the marginal lines (< $5\sigma_{\rm fit}$) show 
that the number of lines continues to increase in accordance with the exponential distribution 
down to $2.5\sigma_{\rm fit}$, and inspection of Figure 6 shows that these marginal lines always correspond 
to a real feature in the "true" spectrum. We conclude that absorber line counting from this 
hydrodynamic simulation provides an excellent match to the demographics of the Lyman-$\alpha$ 
forest at $z = 2$. 



4.2. Coincident Lines Between Sight-lines 

The first indications that Lyman-$\alpha$ absorbers were larger than the halos of individual 
galaxies came from the detection of coincident lines in the spectra of quasar pairs (Dinshaw 
et al. 1994; Bechtold et al. 1994). Most paired lines of sight experiments define absorber 
pairs by choosing a maximum separation in velocity, typically 50-300 km s$^{-1}$, that lines 
must have in order to be labeled as coincident and therefore physically associated. The 
operational definition of a matching "window" depends on the line density, and it is chosen 
to minimize the probability of a chance match. A "coincident" pair is two lines that match 
in velocity closely enough that the probability of a chance match is small. However, the 
preselection of a velocity match window can potentially lose information, and it makes an 
implicit assumption about the kinematic state of the absorbing gas (i.e. the amplitude of 
any velocity shear on that particular transverse scale). A very large number of paired lines 
of sight can be extracted from the simulations, with many potential absorber pairs at each 
of the transverse separations. We have designed a matching algorithm that does not depend 
on any prior definition of a velocity match. This allows an examination of how the coherence 
of the absorbers changes over transverse separations of 33 $h_{50}^{-1}$ kpc to 1 $h_{50}^{-1}$ Mpc. 

Coincident lines are defined such that coincidences are symmetric. In other words, if 
there is a line in sight-line A, its coincident line is the nearest line in velocity space in sight-line 
B. However, coincident lines are also selected starting with sight-line B and matching to 
sight-line A. An absorber is not identified as a coincident line unless it is a reversible match 
both from A to B and from B to A. This procedure will of course result in lines that are 
unmatched in both sight-lines. We account for the fact that the simulation cube is periodic 
by matching across the ends of each simulation spectrum. To discuss line matching statistics, 
we have chosen to use the single realization of the simulation spectra with SNR = 30 and 
$\Gamma_{\rm res}$ = 80 km s$^{-1}$. This most closely mimics the observational spectra of the three quasar 
pairs at $z = 2$ that we will analyze in Paper II. Absorber matches were found for 300 primary 
sight-lines; seven extracted spectra make up each set. The fraction of matched lines is 
87%, 73%, 65%, 61%, 59% and 58% for transverse separations 33, 100, 233, 400, 667, 1000 
$h_{50}^{-1}$ kpc, respectively. The fraction of matched lines for random lines of sight formed by 
pairing lines of sight from the x-projection with lines of sight from the z-projection is 57%. 
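
The symmetric matching rule can be sketched as a mutual nearest-neighbor search with periodic wrapping. This is an illustration of the rule as described, not the software used for this work; the function names are hypothetical, line positions are taken to be velocities, and `box` is the velocity length of the periodic spectrum.

```python
def periodic_dv(v1, v2, box):
    """Velocity separation of two lines in a periodic spectrum of length box."""
    d = abs(v1 - v2) % box
    return min(d, box - d)


def nearest(v, others, box):
    """Index of the line in `others` closest in velocity to v."""
    return min(range(len(others)), key=lambda j: periodic_dv(v, others[j], box))


def match_lines(a, b, box):
    """Return (i, j) index pairs where a[i] and b[j] are each other's
    nearest neighbor, i.e. the match is reversible."""
    if not a or not b:
        return []
    pairs = []
    for i, va in enumerate(a):
        j = nearest(va, b, box)
        # Keep the pair only if the match from B back to A returns the same line.
        if nearest(b[j], a, box) == i:
            pairs.append((i, j))
    return pairs
```

Lines that do not appear in any returned pair are the unmatched lines in each sight-line. The velocity difference of each pair is kept as a measured quantity instead of being used as a selection cut.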

Figure 11 illustrates the line-matching procedure. Figure 11a is merely a test of the 
algorithm. Figure 11b shows the effect of noise, and of the resulting slight differences in the 
degraded spectra, on the scatter between the equivalent widths of matched lines. At 
SNR = 30, only 3% of the lines fail to match due to fluctuations added by random noise. 
Figure 11c shows that there can be substantial differences in line strength even across a 
transverse scale of 33 $h_{50}^{-1}$ kpc, which is below the resolution of the simulation, where intrinsic 
differences must be small. Nevertheless, the formal correlation is highly significant. Figure 
11d shows the null experiment, where absorbers detected in 300 primary sight-lines are 
matched against 300 randomly selected sight-lines from an orthogonal projection of the 
simulation. This ensures that any two features at similar wavelengths are separated by 
10-20 $h_{50}^{-1}$ Mpc in space, and so are expected to be uncorrelated. 

In Figure 11b, the scatter is larger than anticipated from the addition of noise alone because 
the line selection procedure must deal with issues of deblending and continuum fitting. 
The very few outliers (out of 924 line pairs) are mostly caused by the situation where the 
addition of noise to a marginally resolved feature causes it to be fit with two components 
in one sight-line but only one in the other. At first sight, the large scatter in Figure 11c 
is more surprising, because the transverse separation is at the limit of resolution of the 
simulation. The scatter comes about because the mapping from real space to spectral space 
is complex. Peculiar gas motions affect the optical depth both by shifting line centers and 
by changing line profiles due to velocity gradients (Bi & Davidsen 1997). The fact that our 
line fitting software is responding to real physical differences in opacity can also be seen 
from the analytic modelling of Viel et al. (2001). They show calculated Lyman-$\alpha$ spectra 
at $z = 2.15$ with noticeable differences on transverse scales as small as 60 $h_{50}^{-1}$ kpc, where a 
significant fraction of lines fit with Voigt profiles have column densities that differ by more 
than a factor of two. Taken together with the added effects of noise, the scatter in Figure 11c 
can be readily understood. 

A major result of this paper is shown in Figure 12. This shows matched or coincident 
lines plotted against the velocity difference of the line pair for the six transverse separations 
with respect to the PLOS. (Of course, it is possible to generate many other transverse separations 
up to 1000 $h_{50}^{-1}$ kpc using these spacings, but the choices in Figure 12 illustrate 
the coherence phenomenon adequately.) Given the mean line spacing of 850 km s$^{-1}$ (for 
a $5\sigma$ line), the line matching algorithm becomes aliased at 425 km s$^{-1}$, so matches with 
separations above this value are not physically meaningful. 

The most sensitive test of absorber coherence uses the fact that truly coincident lines 
will have a small velocity separation between sight-lines, whereas random matches (not physical 
associations) will have a larger range of velocity differences. Figure 12 suppresses the 
information on homogeneity by plotting the average of the equivalent widths of the paired 
absorbers against the velocity splitting of the pair. The distribution of the velocity splittings 
of matched pairs changes quite dramatically with transverse separation. To quantify this, 
we plot the cumulative distribution of the velocity splittings for the paired absorbers in Figure 
13. To show the expectation for a random set of absorbers with no physical association 
between the sight-lines, we also form the cumulative distribution for pairings between spectra 
extracted from orthogonal projections through the simulation volume (the dashed lines in 
Figure 13). Any two paired absorbers in this case will have a physical separation of 
~10-20 $h_{50}^{-1}$ Mpc and so are not anticipated to show coherence. A K-S test demonstrates 
that there is an excess of small velocity splittings, and therefore detectable coherence, up to 
about 500 $h_{50}^{-1}$ kpc. 
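
The comparison of two cumulative distributions rests on the two-sample K-S statistic, the largest vertical distance between the empirical cumulative distributions. A minimal pure-Python sketch is given below; the function name is hypothetical, and it returns only the statistic, not a significance level.

```python
from bisect import bisect_right


def ks_statistic(sample1, sample2):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum absolute
    difference between the two empirical cumulative distributions."""
    s1, s2 = sorted(sample1), sorted(sample2)
    n1, n2 = len(s1), len(s2)
    d = 0.0
    # Both distributions are step functions, so the maximum difference
    # is reached at one of the data values.
    for x in s1 + s2:
        cdf1 = bisect_right(s1, x) / n1
        cdf2 = bisect_right(s2, x) / n2
        d = max(d, abs(cdf1 - cdf2))
    return d
```

Applied to the velocity splittings of matched pairs and of random pairings, a large statistic indicates that the two sets are unlikely to share a parent distribution. A library routine such as `scipy.stats.ks_2samp` additionally returns the associated probability.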

A typical method of examining the homogeneity of Lyman-$\alpha$ absorbers over various 
transverse separations is to plot the equivalent widths of each line in a matched pair. However, 
noise and intrinsic differences in the absorbing regions imprint themselves on a spectrum 
and affect the line fitting algorithm in a complex way, so that apparently similar spectra can 
yield coincident line pairs with substantially different equivalent widths. 

To see if line matching depends on line strength, we formed distributions of equivalent 
widths for both the paired lines and the unpaired lines. A K-S test shows that these 
distributions are consistent with being drawn from the same parent distribution. Even though 
there are many more weak (lower column density) lines than strong (higher column density) lines, 
the degree of coherence on any particular transverse scale does not depend on line strength. 
However, the fraction of matched lines decreases with increasing transverse scale, a clear 
demonstration that coherence is being lost as the scale approaches 1 $h_{50}^{-1}$ Mpc. 

4.3. Comparison with Continuous Statistics 

Up until now we have been examining the coherence of the Lyman-$\alpha$ absorbers using 
the highest density peaks that are fitted as lines. Clustering or coherence may also be 
analyzed through the use of continuous statistics that leverage the flux information in every 
independent point of the spectra. Various types of correlation measures are well described 
in Cen et al. (1998). We define the two-point correlation function (TPCF) of the normalized 
absorbed flux, $F(v)$, along neighboring lines of sight as 

where $F_1$ and $F_2$ refer to fluxes along the two lines of sight. 

Figure 14 shows the TPCF for each of the transverse separations 33, 100, 233, 400, 667, 
and 1000 $h_{50}^{-1}$ kpc. We compute the auto-correlation function, which is the PLOS correlated 
with itself ($F_1 = F_2$ in Equation 5), and this is shown as the curve with the highest amplitude. 
We also compute the TPCF for lines of sight that are randomly associated by correlating 
the PLOS from spectra extracted from the x-projection with PLOS from the z-projection. 

The TPCF for discrete absorbers in a line-counting experiment is computed for all 
transverse separations by comparing the number of observed nearest neighbor pairs, $N_{\rm obs}$, 
with the number of pairs formed from random lines of sight, $N_{\rm ran}$, as follows: 

$$\xi_{\rm pair} = \frac{N_{\rm obs}}{N_{\rm ran}} - 1. \qquad (6)$$

The random lines of sight are the PLOS from the x- and z-projections. Figure 14 shows the 
correlation functions for each of the six transverse separations. The amplitude of the TPCF 
for each method decreases to approach the curve for random pairings, demonstrating that coherence 
can be measured using line-fitting methods. In Paper II, we will illustrate the different 
types of information that are gleaned by using continuous flux statistics and line-counting 
techniques, and we will quantify the amount of data required to detect coherence using line 
counting. 
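
The pair-count estimator can be sketched as a binned ratio of observed to random pair counts minus one. The function name and the binning in velocity splitting are assumptions made for illustration, and the sketch assumes the observed and random samples come from the same number of sight-line pairs, so that no extra normalization is needed.

```python
from bisect import bisect_right


def pair_correlation(dv_obs, dv_ran, bin_edges):
    """Return N_obs / N_ran - 1 in each velocity-splitting bin.

    dv_obs, dv_ran : velocity splittings of observed and random pairs
    bin_edges      : increasing bin edges; bins are [edge_i, edge_{i+1})
    Bins with no random pairs are returned as None.
    """
    n_bins = len(bin_edges) - 1

    def histogram(values):
        counts = [0] * n_bins
        for v in values:
            k = bisect_right(bin_edges, v) - 1
            if 0 <= k < n_bins:   # ignore values outside the binned range
                counts[k] += 1
        return counts

    n_obs, n_ran = histogram(dv_obs), histogram(dv_ran)
    return [o / r - 1.0 if r > 0 else None for o, r in zip(n_obs, n_ran)]
```

Positive values at small velocity splittings indicate an excess of pairs over the random expectation; values near zero indicate no detectable coherence.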

5. SUMMARY 

This paper has established techniques for the intercomparison of quasar spectra and 
spectra extracted from cosmological simulations, recognizing that observations are traditionally 
interpreted in terms of line-counting, while simulations offer a direct measure of 
neutral hydrogen opacity at every resolution element. This work anticipates a series of direct 
comparisons using space- and ground-based observations of quasar pairs and groups. 
Initial tests have been carried out on the Lyman-$\alpha$ forest at $z = 2$, with simulation spectra 
degraded to span the range of observational quality of typical ground- and space-based data. 
The main results are as follows: 

(1) Simulators measure opacity with respect to a pre-defined continuum, while observers 
must determine a quasar continuum that is not known a priori, in the presence of noise and 
emission features. Software has been designed that robustly measures continua both for 
the small wavelength (redshift) range of simulations and for the long wavelength range of 
typical observations. The systematic underestimate of opacity ranges from $\Delta\tau = 0.02$ for 
SNR = 100, $\Gamma_{\rm res}$ = 5 km s$^{-1}$ to $\Delta\tau = 0.06$ for SNR = 10, $\Gamma_{\rm res}$ = 300 km s$^{-1}$. 

(2) A fully automated procedure has been developed for the detection and selection of 
absorption features that produces reliable results for observational data having SNR = 10 
to 30 and $\Gamma_{\rm res}$ = 80 to 300 km s$^{-1}$. The techniques are robust even in extended regions 
of strong absorption. A direct comparison with published quasar surveys shows that the 
number density and equivalent width distribution of absorption lines at $z = 2$ agree with 
measurements of spectra extracted from an SPH simulation. 

(3) A technique has been established to match absorption lines between adjacent lines 
of sight. The match rate of coincident lines is used to search for coherence in the absorbing 
gas on transverse scales up to 1 $h_{50}^{-1}$ Mpc. The two-point correlation function of matched 
pairs reveals a significant excess out to transverse scales of ~ 500 $h_{50}^{-1}$ kpc, indicating the 
detection of Lyman-$\alpha$ coherence on this scale. Most of the signal of excess pairs occurs with 
velocity splittings of < 100 km s$^{-1}$, indicating that the velocity field is quiet. The coherence 
signal measured with line counting and matching techniques agrees well with results from a 
two-point flux correlation analysis, which uses all the information in the simulation spectra. 

We are particularly grateful to Tom Aldcroft, who made available the code that formed 
the core of the software suite described in this paper, and Jennifer Scott, who shared her 
thesis data. We acknowledge useful discussions with Romeel Davé, Craig Foltz, Jane Charlton, 
and Rupert Croft. This work was supported by NASA Astrophysical Theory Grants 
NAG5-3922, NAG5-3820, and NAG5-3111, by NASA Long-Term Space Astrophysics Grant 
NAG5-3525, and by the NSF under grants ASC93-18185, AST-9803072, and AST-9802568. 

Table 1. Comparison of Number Density of Lyman-$\alpha$ Absorbers 

Sample   SNR   $\Gamma_{\rm res}$$^a$   $W_{\rm lim}$$^b$   $(dN/dz)_{\rm sim}$   $(dN/dz)_{\rm obs}$$^c$   $(dN/dz)_{\rm obs}$$^d$
               (km s$^{-1}$)            (Å)
(1)      (2)   (3)                      (4)                 (5)                   (6)                       (7)

1        30    80                       0.05                170 ± 3
2        30    300                      0.05                75 ± 4
3        10    80                       0.17                92 ± 4
4        10    300                      0.17                38 ± 3

143 ± 11 
108 ±2 





$^a$ Resolution of the degraded simulation spectra as described by the 
FWHM of a Gaussian distribution in km s$^{-1}$. 

$^b$ The limiting rest equivalent width for an isolated $5\sigma_{\rm det}$ absorber. 

$^c$ The number of absorbers per unit redshift obtained by counting the 
number of lines in the sample of Scott, Bechtold, & Dobrzycki (2000) 
in the redshift range $2 < z < 2.1$ with $5\sigma_{\rm fit}$. The errors are $1\sigma$ Poisson 
errors; systematic uncertainties are likely to be larger. 

$^d$ The number of absorbers per unit redshift computed from the absorber 
evolution model of Scott, Bechtold, & Dobrzycki (2000). The 
errors correspond to the $1\sigma$ errors on $\gamma$, where the integrated evolution 
function is specified by Equation 4 in the text. 
Fig. 1. — A schematic view of the range in SNR and resolution of the major examples of 
quasar spectra from pioneering observational facilities. Open boxes refer to ground-based 
observations of Lyman-α with z > 1.6; shaded boxes refer to the unique capabilities of HST 
at lower redshifts. The horizontal bar shows the (redshift-independent) range of Doppler 
parameters, as measured with echelle data. The dotted lines show the detection limits for 
a $5\sigma_{\rm det}$ line having hydrogen column density as labeled, $\log N$ = 15.0, 14.5, 14.0, 13.3, 12.8 
atoms cm$^{-2}$. The detection limits are computed assuming the spectral dispersion is one-third 
of the instrumental resolution. 
Fig. 2. — Examples of undegraded (i.e. raw) spectra extracted from the simulation. The 
mean transmitted flux at z = 2 is 0.83; panels (c) and (d) have a typical amount of absorption. 
The mean opacities from panels (a) to (f) are 0.06, 0.07, 0.13, 0.15, 0.34, 1.06. The percentage 
of simulated spectra with opacity less than that shown in panels (a) through (f) is 1%, 6%, 
37%, 47%, 95%, and 99%, respectively. 
Fig. 3. — Two spectra extracted from the simulation, each of typical opacity at z = 2. 
The twelve panels in each case show the spectrum degraded to SNR = 10, 30, and 100, and 
$\Gamma_{\rm res}$ = 5, 20, 80, and 300 km s$^{-1}$. The top right panel is a reasonable facsimile of the original 
spectrum. 
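The degradation illustrated in Figure 3 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the software suite described in this paper: it takes a continuum-normalized spectrum on a uniform velocity grid, smooths it with a Gaussian of the requested FWHM, and adds Gaussian noise with standard deviation 1/SNR per pixel. The function names, the `pixel_kms` parameter, the periodic wrapping at the spectrum edges, and the constant noise level are all assumptions made for the example.

```python
import math
import random


def gaussian_kernel(fwhm_kms, pixel_kms):
    """Normalized Gaussian kernel sampled on the pixel grid (assumed uniform)."""
    sigma_pix = fwhm_kms / (2.0 * math.sqrt(2.0 * math.log(2.0))) / pixel_kms
    half = max(1, int(math.ceil(4.0 * sigma_pix)))  # truncate at about 4 sigma
    weights = [math.exp(-0.5 * (i / sigma_pix) ** 2) for i in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def degrade(flux, fwhm_kms, snr, pixel_kms, seed=0):
    """Smooth to the given FWHM, then add Gaussian noise with sigma = 1/SNR.

    Assumes the continuum level is 1 and that the spectrum wraps
    periodically (an assumption suited to a periodic simulation box).
    """
    kernel = gaussian_kernel(fwhm_kms, pixel_kms)
    half = len(kernel) // 2
    n = len(flux)
    rng = random.Random(seed)  # fixed seed gives a reproducible noise realization
    degraded = []
    for i in range(n):
        smoothed = sum(k * flux[(i + j - half) % n] for j, k in enumerate(kernel))
        degraded.append(smoothed + rng.gauss(0.0, 1.0 / snr))
    return degraded
```

Calling `degrade(flux, 80.0, 30.0, pixel_kms)` with a different `seed` produces the same smoothed spectrum with an independent noise realization.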
Fig. 4. — An illustration of how the underestimate of opacity due to continuum fitting 
depends on both SNR and resolution. The mean opacity from each undegraded spectrum is 
plotted against the mean opacity measured relative to the fitted continuum for each degraded 
spectrum. Each panel represents a different combination of SNR and resolution; the layout 
is the same as Figure 3. 
Fig. 5. — The opacity decrement from the continuum fitting procedure applied to 1000 
spectra extracted from the simulation. The quantity Δτ is the difference between the 
continuum level of an input spectrum (raw, undegraded) and the fitted continuum to a degraded 
spectrum, converted into opacity. 
Fig. 6. — An illustration of the line-fitting procedure for the two extracted spectra from 
Figure 3, in each of the realizations SNR = 10 and 30, $\Gamma_{\rm res}$ = 80 and 300 km s$^{-1}$. Fitted 
continuum and superimposed lines are shown by a dashed line, and line centers of absorbers 
with strength of at least $5\sigma_{\rm det}$ are shown by tick marks. Longer thicker ticks denote lines 
with secure reliability > $5\sigma_{\rm fit}$ and shorter thinner ticks mark lines with marginal reliability 
of < $5\sigma_{\rm fit}$. 
Fig. 7. — The distribution of line centers for absorbers found in 300 PLOS for the SNR = 30, 
$\Gamma_{\rm res}$ = 80 km s$^{-1}$ realization for the x-, y-, and z-projections. The dotted line shows the 
mean of the distribution. The individual projections reveal coherent, large-scale structures. 
The average of the individual projections (lower right) shows that the distribution converges 
toward a uniform distribution as more lines of sight from all projections are combined. 
Fig. 8. — The distribution of the number of absorbers per line of sight for 300 PLOS for the 
SNR = 30, $\Gamma_{\rm res}$ = 80 km s$^{-1}$ realization for the x-, y-, and z-projections. The dotted line is 
a Gaussian distribution with the mean and standard deviation of the data. The good match 
to the Gaussian demonstrates that the number of absorbers per line of sight is consistent 
with a random distribution of absorbers on scales similar to the size of the simulation box. 
Fig. 9. — (a) A histogram of the number of $5\sigma_{\rm fit}$ absorbers as a function of rest equivalent 
width, measured by line-fitting applied to 300 spectra extracted from the simulation degraded 
to SNR = 10, $\Gamma_{\rm res}$ = 80 km s$^{-1}$. The solid line is the best-fit evolution function to the 
observed data at z = 2 from Scott, Bechtold, & Dobrzycki (2000), with dashed lines showing 
1σ error bars due to the parameter γ. (b) Expanded view of the distribution of weak line 
strengths, with detection significance superimposed. 
Fig. 10. — (a) A histogram of the number of $5\sigma_{\rm fit}$ absorbers as a function of rest equivalent 
width, measured by line-fitting applied to 1000 spectra extracted from the simulation 
degraded to SNR = 30, $\Gamma_{\rm res}$ = 80 km s$^{-1}$. As in Figure 9, the solid line is the best-fit evolution 
function to observed data at z = 2 from Scott, Bechtold, & Dobrzycki (2000), with dashed 
lines showing 1σ error bars due to the parameter γ. (b) Expanded view of the distribution 
of weak line strengths, with detection significance superimposed. 
Fig. 11. — A plot of the rest equivalent widths of paired absorbers for the sample where 
SNR = 30 and $\Gamma_{\rm res}$ = 80 km s$^{-1}$. (a) 300 primary lines of sight are matched to themselves 
as a test of the algorithm, so the correlation is perfect. (b) 300 primary lines of sight are 
matched to themselves, but with a different noise seed. (c) Matched pairs found for the closest 
transverse separation of 33 $h_{50}^{-1}$ kpc, which approaches the resolution of the simulation, so 
intrinsic differences in the spectra should be small. (d) 300 primary lines of sight are matched 
to 300 randomly-selected lines of sight from an orthogonal projection of the simulation. No 
pairs due to physical coherence are expected in this situation. 
Fig. 12. — The average rest equivalent width for matched absorber pairs plotted against the 
velocity splitting of the pair for 300 sets of lines of sight at 6 transverse separations. The 
fraction of matched lines falls from 87% for a separation of 33 $h_{50}^{-1}$ kpc to 58% for a 
separation of 1000 $h_{50}^{-1}$ kpc. The fraction of matched lines for random lines of sight is 57%. 
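The matched fraction quoted in Figure 12 can be illustrated with a simple pairing scheme. The sketch below is an assumption for illustration only and is not necessarily the matching algorithm used in this paper: it pairs line centers from two lines of sight one-to-one, closest pairs first, and accepts a pair only when the velocity splitting is within a chosen window. The function names and the `max_split_kms` parameter are hypothetical.

```python
def match_absorbers(v_a, v_b, max_split_kms):
    """Greedy one-to-one matching of line centers (km/s) in two sightlines.

    Returns (index_in_a, index_in_b, splitting) tuples, closest pairs first.
    The greedy rule and the velocity window are illustrative assumptions.
    """
    candidates = sorted(
        (abs(va - vb), i, j)
        for i, va in enumerate(v_a)
        for j, vb in enumerate(v_b)
        if abs(va - vb) <= max_split_kms
    )
    used_a, used_b, pairs = set(), set(), []
    for split, i, j in candidates:
        # Each absorber may belong to at most one pair.
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            pairs.append((i, j, split))
    return pairs


def matched_fraction(v_a, v_b, max_split_kms):
    """Fraction of lines in sightline A that have a partner in sightline B."""
    if not v_a:
        return 0.0
    return len(match_absorbers(v_a, v_b, max_split_kms)) / len(v_a)
```

Applying `matched_fraction` to pairs of lines of sight drawn from orthogonal projections gives the chance-coincidence level against which the physical pairs can be compared.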
Fig. 13. — The cumulative distribution of velocity splittings for absorber pairs at each 
transverse separation in the sample (300 sets) is shown by the solid line. The dashed line is 
the cumulative distribution of absorber pairs formed by pairing x-projection primary lines 
of sight with z-projection primary lines of sight, approximating the distribution expected for 
random absorber pairs. 
Fig. 14. — (a) The two-point correlation function (TPCF) of the transmitted flux for spectra 
having transverse separations of 33, 100, 233, 400, 667 and 1000 $h_{50}^{-1}$ kpc, which is the 
sequence from the highest to the lowest curve at Δv = 0. The curve with the highest 
amplitude is the auto-correlation formed from correlating the PLOS with themselves. The 
TPCF for random lines of sight is formed by correlating the PLOS of the x-projection with 
the PLOS of the z-projection. (b) The TPCF for discrete absorber pairs. The curve with 
the highest amplitude is for a transverse separation of 33 $h_{50}^{-1}$ kpc. Curves with subsequently 
lower amplitudes are for separations of 100, 233, 400, 667 and 1000 $h_{50}^{-1}$ kpc. 
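A flux TPCF of the kind shown in Figure 14(a) can be sketched as follows. This is a minimal illustration under stated assumptions rather than the estimator used in this paper: the flux contrast is taken to be F/<F> - 1 for each spectrum, both spectra share the same uniform velocity grid, the lag is measured in pixels, and the pixel index wraps periodically. The function name `flux_tpcf` and the normalization are assumptions made for the example.

```python
def flux_tpcf(flux_a, flux_b, max_lag):
    """Cross-correlation of flux contrast for lags 0..max_lag (in pixels).

    Uses delta = F / <F> - 1 and periodic wrapping of the pixel index;
    both choices are assumptions made for this sketch.
    """
    n = len(flux_a)
    if len(flux_b) != n:
        raise ValueError("spectra must share the same pixel grid")
    mean_a = sum(flux_a) / n
    mean_b = sum(flux_b) / n
    delta_a = [f / mean_a - 1.0 for f in flux_a]
    delta_b = [f / mean_b - 1.0 for f in flux_b]
    xi = []
    for lag in range(max_lag + 1):
        total = sum(delta_a[i] * delta_b[(i + lag) % n] for i in range(n))
        xi.append(total / n)
    return xi
```

Passing the same spectrum as both arguments gives the auto-correlation, and averaging the result over many paired lines of sight at a fixed transverse separation gives one curve per separation.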



